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Abstract This paper investigates the use of stratified sampling as a variance reduction technique for 
approximating integrals over large dimensional spaces. The accuracy of this method critically depends on 
the choice of the space partition, the strata, which should be ideally fitted to the subsets where the functions 
to integrate is nearly constant, and on the allocation of the number of samples within each strata. When 
the dimension is large and the function to integrate is complex, finding such partitions and allocating the 
sample is a highly non-trivial problem. In this work, we investigate a novel method to improve the efficiency 
of the estimator "on the fly", by jointly sampling and adapting the strata which are hyperrectangles and 
the allocation within the strata. The accuracy of estimators when this method is used is examined in 
detail, in the so-called asymptotic regime {i.e. when both the number of samples and the number of strata 
are large). It turns out that the limiting variance depends on the directions defining the hyperrectangles 
but not on the precise abscissae of their boundaries along these directions, which gives a mathematical 
justification to the common choice of equiprobable strata. So, only the directions are adaptively modified 
by our algorithm. We illustrate the use of the method for the computation of the price of path-dependent 
options in models with both constant and stochastic volatility. The use of this adaptive technique yields 
variance reduction by factors sometimes larger than 1000 compared to classical Monte Carlo estimators. 
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1 Introduction 

A number of problems in statistics, operation research and mathematical finance boils down to the evalu- 
ation of the expectation (or higher order moments) of a random variable 4'0^)^ known to be a complicated 
real valued function of a vector Y = (Vi, . . . , Y^) of independent random variables. In our applications, we 
will mainly focus on simulations driven by a sequence of independent standard normal random variables, in 
situations where the dimension d is very large. Such problems arise in particular in computational finance 
for the pricing of path-dependent options, either when the number of underlying assets is large, or when 
additional source of randomness is present such as in stochastic volatility models. 

The stratification approach consists in dissecting R'' into mutually exclusive strata and ensuring that 
(f) is evaluated for a prescribed and appropriate number of points in each stratum (see (S), 

The main purpose of this paper is to discuss a way of dissecting the space into strata and allocating 
the random draws in the strata, adapted to the case where V is a standard Gaussian vector. We also 
address the accuracy of estimators when this method of sampling is used, and give conditions upon which 
the variance reduction is most effective. 

Our method makes use of orthogonal directions, to induce a dissection of R"^ with the right property. 
These directions and the associated allocation are learnt adaptively, while the simulations are performed. 
The advantage of the adaptive method, similar to those introduced for importance sampling by (fisl ) is 
that information is collected as the simulations are done, and computations of means and variances of 
(/)(y) in strata are used to update the choice of these strata and of the allocation. We investigate in some 
details the asymptotic regime i. e. where the number of simulations and the number of the strata both go 
to infinity. 

The method is illustrated for pricing path-dependent options driven by high-dimensional Gaussian 
vectors, combining importance sampling based on a change of drift together with the suggested adaptive 
stratification. The combination of these two methods, already advocated in an earlier work by is 
very effective; nevertheless, these examples show that, contrary to what is suggested in this work, the 
asymptotical optimal drift vector is not always the most effective direction of stratification. 

The paper is organized as follows. In section [2] an introduction to the main ideas of the stratification 
is presented. Section |3] addresses the behavior of the stratified estimator in the asymptotic regime {i.e. 
when both the number of samples and the number of strata go to infinity). The roles of the stratification 
directions, of the strata boundaries in each direction of stratification and of the allocation within each 
strata are evidenced. In section |4l an algorithm is proposed to adapt the directions of stratification and the 
allocation of simulations within each stratum. In Section O the proposed adaptive stratification procedure 
is illustrated using applications for the pricing of path-dependent options. 



2 An introduction to stratification 



Suppose we want to compute an expectation of the form E [(/'(5^)] where 
function and F is a R'^-valued random variable. We assume hereafter that 



E 



< +00 



is a measurable 



(1) 



We consider a stratification variable of the form fJ/^Y where /i is an orthonormal (d x m) matrix with 
m < d. Given a finite partition {Sj, i G T} of R™, the sample space R'^ 
by 

It is assumed in the sequel that the probability of the strata {p\, i G T} 



the sample space R of F is divided into strata defined 

(2) 



(3) 



are known which is the case when F is a standard Gaussian vector and the Si are hyperrectangles. Up to 
removing some strata, we may assume without loss of generality that Pi(^) > 0, for any i G T. 

Let M be the total number of draws and Q = {qi,i G T} be an allocation vector (i.e. gi > and 
EieiQi = 1) • the number M\ of samples allocated to the i-th stratum is given by 



i G X , 



(4) 
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where [-J denotes the lower integer part and by convention, "^f^ 9j = (it is assumed that the set of indices 
I is totally ordered) . If the number of points in each stratum is chosen to be proportional to the probability 
of the strata, the allocation is said to be proportional. Given the strata {Si,i G 1} and the allocation Q, 
the stratified estimator with M draws is defined by 

ieI:Mi>0 [ ' j=l J 

where {Yi j,j < M;,! G T} are independent random variables with j distributed according to the 
conditional distribution P [fg -iM^y G Si] fori<Afi. 

The stratified estimator is an unbiased estimator of E[(j!>(y)] if the Mi's are all positive (a sufficient 
condition is M > l/miuigi). Its variance is given by X]iGX j\/->o ^i~^pf (a')''"?(m) where cr?(/i) is the 
conditional variance of the random vector (piY) given fi'^Y G Si, 

afif,) E [4>\y) I G Si] - (e [0(y) I e S;])' . (6) 

When M goes to infinity and the number of strata is either fixed or goes to infinity slowly enough, the 
variance of the stratified estimator is equivalent to 12i^x-q>o 1^^ PiM'^iM (see Lemma[l}. The 

two key questions that arise in every application of the stratified sampling method are (i) the choice of 
the dissection of the space and (ii) for a fixed M, the determination of the number of samples to be 
generated in each stratum i. The optimal allocation minimizing the above asymptotic variance subject to 
the constraint X^iex 9i = 1 is given by : 

For a given stratification matrix fi, we refer to Q*(/i) ~ {q^{fi),i G X} as the optimal stratification 
vector. Of course, contrary to the proportions PiifJ-), the conditional expectations E [(j>{Y) \ Y G S^ i] are 
unknown and so are the conditional variances af{fi). 

The simplest approach would be to estimate these conditional variances in a pilot run, to determine 
the optimal stratification matrix and the optimal allocation vector from these estimates, and then to use 
them in a second stage to determine the stratified estimator. Such a procedure is clearly suboptimal, 
since the results obtained in the pilot step are not fully exploited. This calls for a more sophisticated 
procedure, in the spirit of those used for adaptive importance sampling; see for example, (fisl l and (flil l. In 
these algorithms, the estimate of conditional variance and the stratification directions is gradually improved 
while computing the stratified estimator and estimating its variance. Such algorithm extends the procedure 
by (0), who proposed to adaptively learn the optimal allocation vector for a set of given strata and derived 
a central limit theorem for the adaptive estimator (with the optimal asymptotic variance). 



3 Asymptotic analysis of the stratification performance 

We derive in this Section the asymptotic variance of the stratified estimator when both the total number 
of draws M and the number of strata (possibly depending upon M) tend to +oo. The variance of the 
estimator depends on the stratification matrix /i, on the partition {Si, i G X} of the sample space of fi'^Y 
and on the allocation Q. 

For any integer k, we denote by A the Lebesgue measure on R*", equipped with its Borel sigma-field 
(the dependence in the dimension k is implicit). For a probability density h w.r.t the Lebesgue measure 
on R, we denote by H its cumulative distribution function, and its quantile function, defined as the 

generalized inverse of H , 

H~^{u) = inf{a: G {-ff > 0} : H{x) > u} , for any u G [0, 1] , 

where, by convention, inf0 = +oo. Let / be a positive integer. The choice of the strata boundaries is 
parameterized by an m-uplet {gi,...,gm) of probability densities on R in the following sense: for all 
m-uplet i = (ii, . . .,im) G {1, • • • , /}™, 



(8) 
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We denote by g{xi, . . . ,Xm) IlI'Li 9k{^k) the associated joint density. Let fj, he a. d x m ortlionor- 
mal matrix. We consider the stratification S{fi) = {S^i,i G {!,..., /}'"} of the space R"^. Denote by 
'^1 2) the asymptotic variance of the stratified estimator, given by 

?/.m(m,5, Q) =^ Yl p?{^)af{n) , (9) 

ie{l,...,/}'":A/i>0 

where the number of draws M; is given by Q and p\{fi), a?{^), the probability and the conditional variance 
are given by (|3}, and ((6)1, respectively. The dependence w.r.t. g and Q of Af;, Pi(/i) and crfdi) is implicit. 

We consider allocation vectors Qy^ = |gi(x) J^. X d\ , i € {1, • • . , ^}™| parameterized by a proba- 
bility density x ■ K"' R-i-. We assume that the random variable i-i^Y possesses a density w.r.t. the 
Lebesgue measure (on R™). We consider the functions 

i,f,{x) =^ E [^f. (Y) j /i^F = a-] , and C,^{x) =^ E [.^^(F) | fi'^Y = . 

Using these notations, the asymptotic variance of the stratified estimator may be rewritten as 

We will investigate the limiting behavior of asymptotic variance c| }^j{fi,g, Q^^) when the total number of 
samples M and the number of strata / both tend to +00. For that purpose, some technical conditions 
are required. For v a measure on R™ and h a real- valued measurable function on R™, we denote by 
essinf;/ (h) and esssup^ (ft) the essential infimum and supremum w.r.t. the measure u. From now on we use 
the following convention : z/0 is equal to -|-cxj if 2 > and to if 2 = 0. 

/b-" X^/gdX < +00 and essinfg.A (x/ff) > 0. 
A2 for h e {.fM,CA'/M.V'A'/A'}. /m'" h^/g dX < +00. 

Under A[2l A-a.e. , g — implies that = 0. Finally, a reinforced integrability condition is needed 

A3 /r,„ /^(Cm - i'DVix^g] dX < 

When m < d, we establish the expression of the limit as the number of strata / goes to +00 of the limiting 
variance (as the number of simulations M goes to -l-oo) of the stratified estimator. Define 

d(/i, X) =' / /'(Cm - i>l)/x dX . (10) 



Proposition 1 Let m be an integer such that m < d, ,gm be probability density functions (pdf) 

w.r.t. to the Lebesgue measure of M., jj. be a d x m orthonormal matrix, and x be a pdf w.r.t. the Lebesgue 
measure on R™. Assume that g and x satisfy assumptions j^H-^'Q Then 

lirn lim Ahj ^if^^g, Qx) = ^^ip,x) ■ 

Assume in addition one of the following conditions 

(i) esssup^._)^ iffi/x) < +00 and {/a/, M > 1} is an integer-valued sequence such that I^j^ + I^jM^^ —> 
as M goes to infinity. 

(ii) {Im,M > 1} is an integer-valued sequence such that I^j + Ij^'PAI~ as M goes to infinity. 



Then. 



lim Mcj M{lJ',g, Qx) = do{p,x) 
A/— v-)-oo 
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The proof is given in Section [6Tl1 It is worthwhile to note that the limiting variance of the stratified estimator 
x) does not depend on the densities (gi, . . . , gm) that define the strata : only the stratification matrix 
/i and the allocation vector Q^^ enters in the limit. The contribution to the variance of the randomness 
in the directions orthogonal to the rows of /i dominates at the first order. In practice, this means that 
asymptotically, once the directions of stratification are chosen, the choice of the strata is irrelevant; the 
usual choice of gi as the distribution of the i-th component of the random vector fi'^Y, i £ {1, . . . , m} is 
asymptotically optimal. 

On the contrary, the limiting variance <;^(/i, x) depends on the allocation density x- For 8- given 
value of the stratification directions /i, it is possible to minimize the function x ?«d(a'i x)- Assume that 
/r™ U \/Cm-V'2 d\ > 0. Since /g„ y^C^ - iPf, d\ = E [y^Var [<^(y)lA*'?^y]J < VVar(<^(y)), the integral 
is finite by (O and it is possible to define a density x^ by 

='^\/Cm-^m/ / ^^^^|Z^^^d\. (11) 



Then xli is the minimum of x i-^ (m, x) ^'Hd the minimal limiting variance is 

d(M,XM) = (^^ /m\/Cm-V^m = (e y^Var [<^>{Y)\^,TY] 

Provided X41 satisfies assumptions A[l][2] (note that in that case, A|3] is automatically satisfied) , the choice 
X = x^ for the allocation of the drawings in the strata is asymptotically optimal. 

Remark 1 ^4?! expression of the limiting variance ?^(/i, x) been obtained in Lemma 4-1) in the 
case m = 1 and for the proportional allocation rule which corresponds to x = ffj.- shown by these 

authors that the limiting variance is E(Var[<^(y)|^^y]) which is equal to '^'^{fJ,, ffi) (note that in this 
case the stratification density g = satisfies the assumptions J^TI[^ provided that E[0^(y)] < 00). Unless 
Var[(j!.(y)|/i^r] IS a.s. constant, the asymptotic variance is strictly smaller for the optimal choice of the 
allocation density. 

The optimal allocation density Xfj. cannot in general be computed explicitly but, as shown in the following 
Proposition, can be approximated by computing the optimal allocation within each stratum. 

Proposition 2 Let m < d be an integer and /i be an {d x m) orthonormal matrix. Assume that is 
satisfied. Then, 

" ^ =0, 



lim V qtM - / Xm 

^ — ^ /c. 

ie{i,. ..,/}" 



where Q*{fJ-) {q^{ii), i G {1, . . . , /}'"} is given by Qj. Let {/a/, M > 1} be an integer-valued sequence 
such that I^j^ + I™jM^^ as M goes to infinity. Then, 

lim Mqj m{i^,9,Q*{iA) =?^(m,X^) ■ 
M— *-t-oc 

The proof is given in Section 16.11 As the number of strata goes to infinity, the stratified estimator run 
with the optimal allocation Q*(/i) has the same asymptotic variance as the stratified estimator run with 
the allocation deduced from the optimal density x^- In practice, of course, the optimal allocation Q*{fJ.) 
is unknown, but it is possible to construct an estimator of this quantity by estimating the conditional 
variance of Var[0(y)|^"^y G S;] within each stratum Q). 

Remark 1 When m = d, the results obtained are markedly different since the accuracy of the strati- 
fied estimator now depends on the definition of the strata along each direction. Let (t>i_i{x) <l){^Fx), 
dj-cpfj, be the partial derivative of </)^ w.r.t. its fc-th coordinate for k £ {1, . . . ,d}. Let g^. still denote the 
function x = {xi,...,Xii) G R*^ 1— > gki^k)- Assuming A[T1 esssup )^{ffj,/g) < +00 and G satisfies 
esssup_\ (^^=1 \9k4>ti\/ < +00, one checks in (0) that for any integer-valued sequence {Im,M > 1} 

such that limA/^00 {l^l + lif^^i^^^ = 0, 

hm M/|,,L.,,(^„g,Q,)=d(M,5,x) = ^/ |E(^)'dA. 



In addition, limM^+00 Mllj<;j^^ g, Q*(^i)) = ^^(m, 5, X^,g) with xl,g oc fp.\/T,k=l 
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4 An adaptive stratification algorithm 

As shown in the asymptotic theory presented above, under optimal allocation, it is more important to 
optimize the stratification matrix p than the strata boundaries along each stratification directional. Propo- 
sition [l] suggests the following strategy: the "optimal" matrix ^* is defined as a minimizer of the limiting 
variance fj, ^^(MiXm)- course, this optimization problem does not have a closed form expression 
because the functions x i— > ip^{x), x i— > Cm(^) a-re not available. 

We rather use the characterization of as the limiting variance of the stratified estimator 

with optimal allocation given in Proposition [J] The problem boils down to search for a minimizer /i of the 
variance <ij mIi^' 9' M)- '^u'' applications, F is a d-dimensional standard normal vector, and fJ.'^Y 
is a m-dimensional standard Gaussian vector. In this case, we set g^, i = {1, . . . , m} to be the standard 
Gaussian distribution so that the strata boundaries in each directions are the quantiles of the standard 
normal variable. Since <;oo(a'iXm) does not depend on g, the impact of this convenient choice, which leads 
to equiprobable strata for the vector fi'^Y, should be limited. 

Of course, the optimization of <:j j^i{fi,g, Q*{ii)) is a difficult task because in particular the definition 
of this function involves multidimensional integrals, which cannot be computed with high accuracy. Note 
also that, in most situations, the optimization should be done in parallel to the main objective, namely, 
the estimation of the quantity of interest E[(;/)(y)], which is obtained using a stratified estimator based on 
the adaptively defined directions of stratification ^t. The adaptive stratification is analog to the popular 
adaptive importance sampling; see for example Hi), (0), HO), and HI). 

When the function to minimize is an expectation, the classical approaches to tackle this problem are 
based on Monte Carlo approximations for the integrals appearing in the expression of the objective function 
and its gradients. There are typically two approaches to Monte Carlo methods, the stochastic approximation 
procedure and the sample average approximation method; see (0). In the adaptive stratification context, 
these Monte Carlo estimators are based on the current fit of the stratification matrix and of the conditional 
variances within each stratum, the underlying idea being that the algorithm is able to progressively learn 
the optimal stratification, while the stratified estimator is constructed. 

The algorithm described here is closely related to the sample average approximation method, the main 
difference with the classical approach being that, at every time a new search direction is computed, a new 
Monte Carlo sample (using the current fit of the strata and of the allocation) is drawn. 

Suppose that Y admits a density w.r.t. the Lebesgue measure denoted by /. Define for i G {1, • ■ • , I}™, 
a function h G {/, (j>f, <f>'^ f} , and an orthonormal d x m matrix ^, 

d f f /■ " 

Uiih,^,) ^ hdX^ H l[y^G-\{i,-l)/I)<{^„y)<G-\i,/I)}'"^^ ^ (12) 

where {x, y) denotes the scalar product of the vectors x and y and /xj. the fc-th column of ^. Using the 
definition of u-^, the proportions Pi(/i) and the conditional variances in each stratum cr?(^) respectively 
given by ^ and © may be expressed as, when i'i(/, /i) > 0, 

Pi(M) = .i(/,M), and a,(^) = _^^-(^_^J . (13) 

When M is large and I is fixed, minimizing the asymptotic variance of the stratified estimate with optimal 
allocation is equivalent to minimize V{ii) w.r.t. the stratification matrix jx where (see Lemma [1} 

i=l i=l 

Assuming that the functions /i i-^ Vi{h, fi) are differentiable at /i for h G {/, /</), f'fP'}, the gradient may be 
expressed as 

(14) 



^ Of course, tiiis is an asymptotic result, but our numerical experiments suggest that optimizing the strata 
boundaries along each stratification direction does not lead to a significant reduction of the variance. This is why 
we concentrate on the optimization of the stratification matrix 
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The computation of this gradient thus requires to calculate i^i{h, ^) for h G {/, f(p, f4>^}. For a vector 
7^ 0, and z £ K, define A^, the restriction of the Lebesgue measure on the hyperplane {y G 
R^{/.,y) = 2}. 

Proposition 3 Let 2 G R, ft : R'' M. be a locally bounded integrable real function, gz : R'' 3 i-^ 
J l{y^{u,y)<z}h{y) dX{y) and ji G R'* be a non-zero vector. Assume that h is continuous almost 
everywhere and that there exists e > such that 



lim sup I \y\l!\y\>M}\Hy)\d\"{y) . (15) 

Then, the function v i— > gz(y) is differentiable at n and V^^ 5z(m) ~ ~ J jj^ h[y) dXz{y). 

Corollary 1 ^ssame i/iat h is a real locally bounded integrable function. Let m be an integer, z — 
{zi,...,Zm) G R", gz ■■ R'^''™ 9 {ul,...,v„^) >^ /n)r=l ^{y,{y^,,y)<z^}Ky) d\[y) and ^ [w,...,^™] G 
jjdxm ^ rank matrix. Assume that h is continuous X^^Li -^z*" almost everywhere and that there 
exists e > such that, for any k G {l,...,m}, limM^+oo sup|j,_^j^|<g / iy|l||j^|>j,ji,|/i(j/)| dWiiy) = 0. 
Then, gz is differentiable at and the differential V^gz is given by V ^gz = [V^^ (?z, • • ■ , V^^g^;], where 



The algorithm goes as follows. Denote by {74} a sequence of stepsizes. Consider the strata {Si,i G 
{1, ■ ■ • , /}"'} given by ((8} for some product density g. 

1. Initialization. Choose initial stratification directions jj.^^"^ and an initial number of draws in each 
statum M^") =^ {M^^^i G=' {1,...,/}™} such that Ei ^'-^/"^ = M. Compute the probabilities 
PiCM^"^) of each stratum. 

2. Iteration. At iteration t + 1, given M(*) and i G {1, ■ • • ,/}'"}, 

(a) Compute W(^i(*)); 

(i) for i G {I,-- - draw M.'-*-' realizations of i.i.d. random variables {Y^^^,k < Afp^} 

with distribution ¥{Y G ■]¥ G S^co^) and evaluate i>l*~^^\h) = ^''^fw^ T,kll ^ (^/fc ) 
for h G {0,0^}. 

(ii) for k G {I,--- ,m}, s G {0^:^(1//), •■ • ,G^^((/- 1)//)}, draw M^*] realizations of i.i.d. 

ft) T 

random variables with distribution V{Y G • | [fi). ] Y = s). Compute a Monte Carlo estimate 

of V^i/i(/i, M^*)) for h G {/, /(;/>, f4?} based on Corollary [T] 
(iii) Compute a Monte Carlo estimate of based on the expression (|14p . 

(b) Update the direction of stratification: Set /i = /i^*-* —74 VF(/x^*-'); define as the orthonor- 
mal matrix found by computing the singular value decomposition of fi and keeping the m left 
singular vectors. 

(c) Update the allocation policy: 

(i) compute an estimate (t|*^^'' of the standard deviation within stratum i 

/ , s , s 9\ 1/2 

^(t+l) _ ( ^'i (-^ ) / '^i ("J^) 



(ii) Update the allocation vector 




Eje{i,...,7}™Pj(M«)-i*+'^ 



and the number of draws {M^^~^^\ \ G {1, . . . ,/}™} by applying the formula with a 
total number of draws equal to M. 
(d) Update the probabilities Pi{iJ.^*~^^^), i G {I,-- - ,/}'". 
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(e) Compute an averaged stratified estimate of the quantity of interest: Estimate the Monte Carlo 
variance of the stratified estimator for the current fit of tlie strata and tlie optimal allocation 



2i(t+l) 1 I / (*)^ -(t+1) 

\ie{ir-J}™ / 



Compute the current fit of the stratified estimator by the following weighted average 



t+l , \ 1 t+l 



\t = 1 1 / r = l J 

There are two options to choose the stepsizes {74, t > 0}. The traditional approach consists in taking a 
decreasing sequence satisfying the following conditions (see for example \Vu)) 

^ 7t = +00 , $Z 7? < +00 . 

t>0 t>0 

If the number of simulations is fixed in advance, say equal to A'', then one can use a constant stepsize 
strategy, i.e. choose 74 = 7 for all t £ {1, . . . , A'^}. As advocated in a sensible choice in this setting 
is to take 74 proportional to AT ' . The convergence of this crude gradient algorithm proved to be quite 
fast in all our applications, so it is not required to resort to computationally involved alternatives. 



Step 2(a)ii is specific to the optimization problem to solve and is not related to the stratification 
itself. The number of draws for the computation of the surface integral (see Corollary [ij can be chosen 
independently of the allocation When the samples in steps |2(a)i| and |2(a)ii| can be obtained by 



transforming the same set of variables (see Section [5] for such a situation), it is natural to choose M*-*-* — 

{Mi%k G {1, . . . ,m}, s G {G^\l/I), ■■■ ,G^\iI- 1)//)}} such that Efc,. M^*^ = M. 

When has a product form (which is the case e.g. when F is a standard d-dimensional Gaussian 
distribution), we can set g = ffi. Then, the strata are equiprobable and Pi{fi) — 1/1™ for any (i, fi). 

It is out of the scope of this paper to prove the convergence of this algorithm and we refer the reader 
to classical treatises on this subject. The above algorithm provides, at convergence, both (i) "optimal" 
directions of stratification and an estimate of the associated optimal allocation; (ii) an averaged stratified 
estimate £. By omitting the step[2el the algorithm might be seen as a mean for computing the stratification 
directions and the associated optimal allocation, and these quantities can then be plugged in a "usual" 
stratification procedure. 



5 Applications in Financial Engineering 

The pricing of an option amounts to compute the expectation E [S(Y)] for some measurable non-negative 
function E on M.'^, where V is a standard d- multivariate Gaussian variable. The Cameron-Martin formula 
implies that for any i/ £ R*^, 

E[S(Y)] = E^S{Y + u) expi-u'^Y -0.5u'^u)j , (17) 

The variance of the plain Monte Carlo estimator depends on the choice of v. In all the experiments below 
(except for the |^ estimator), we use either (/)(y) — S(y) (case = 0) or (/)(y) = S(y -f v^,) exp{—u^y — 
O.bv^VT^) where Vi, is the solution of the optimization problem 

argmax{^gj{'',H(i.)>0} ^In E{u) - 0.5u'^ , (18) 

(case u — Vi,). The motivations for this particular choice of the drift vector u and procedures to solve this 
optimization problem are discussed in (jSj). 

We apply the adaptive stratification procedure introduced in Section|4] (hereafter referred to as "AdaptStr") 
in the case m = 1. For comparison purposes, we also run the stratification procedure proposed in ^ (here- 
after referred to as " GHS"), combining (i) importance sampling with the drift Vi, defined in (|18|) . and (ii) 
stratification with proportional allocation and direction /ig defined in (Q, Section 4.2). 

We also run stratification algorithms with three different directions of stratification: the vector /i^ oc 
the vector ^reg proportional to the vector of linear regression of the function 4>iX) on Y (these regression 
coefficients are obtained in a pilot run), and a vector /i; which is a simple guess specific to each application. 
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For these three directions, we run stratification with proportional allocation (case "5,;" set to "prop") and 
with optimal allocation (case "5^" set to "opt"). We also run the plain Monte Carlo estimator (column 
"MC"); when used with v = u*, "MC" corresponds to an importance sampling estimator with a drift function 

Finally, we compare these stratified estimators to Latin Hypercube (LH) estimators (see Q), for a 
description of this method). For Y a standard normal vector in M'^, the expectation of interest E[(/>(y)] is 
also equal to E[(;6(Oy)] for any orthogonal matrix O G R'^^'' but the variance of the LH estimator associated 
with the variable (f}{OY) depends on the choice of O. Unfortunately, it is very difficult to compute explicitly 
the asymptotic variance of LH estimators and therefore to adapt the matrix O; see (|l3h . Since LH somehow 
consists in stratifying each canonical direction, choosing the first column of O equal to the stratification 
direction /j, should be sensible. In our numerical experiments, we consider such matrices O obtained by 
orthonormalization of the basis combining fi and the d — 1 last vectors of the canonical basis of R'' with fi 
equal to /x*, Hreg or to the adaptive stratification direction obtained by our algorithm AdaptStr. 



5.1 Practical implementations of the adaptive stratification procedure 

The numerical results have been obtained by running Matlab codes available from the authors In the 
numerical applications below, ?n = 1. We choose g — so that the strata are equiprobable (piifJ.) = 1//). 
We choose / — 100 strata and Af = 20 000 draws per iterations. 

The drift vector ly that solves (|18p is obtained by running solnp, a nonlinear optimization program 
in Matlab freely available at |http : //www . Stanford . edu/^yyye/matlab/ [ The direction ^(") is set to the 
unitary constant vector (1, • • • , l)/\/d; the initial allocation is proportional. Exact sampling under 

the conditional distributions P(F G ■{¥ G S^(t) ;) and P(F G -[[^^'^l^F = s) can be done by linear 
transformation of standard Gaussian vectors (see (Q, section 4.3, p. 223)). The draws in step 2(a)i and 



2(a)ii can be obtained by transforming the same set of Af'*' Gaussian random variables {VLj < M^*\ i G 



{1, • • ■ , 7}}. Therefore, the total number of d-dimensional Gaussian draws by iteration is M (the estimates 
of i'i{h, fx) and V fiViih, fi) are not independent); M uniform draws in (0,1) are also required to sample 
under the conditional distribution P(y £ ■\Y £ S^(f) ;). The criterion is optimized using a fixed stepsize 
steepest descent algorithm (the stepsize is determined using a limited set of pilot runs). 



5.2 Assessing efficiency of the adaptive stratification procedure 

We compare the averaged stratified estimate £.^^^ obtained after A*' — 200 iterations, with different strat- 
ification procedures and with the crude Monte Carlo estimate. We report in the tables below the estimate 
of the option prices and the estimates of the variance of the estimator obtained from 50 independent 
replications. 

The comparison of the procedures relies on the variance of the estimators. The column "MC" is an 
estimate of the variance of 4>iX) computed with MN i.i.d. samples of a d-multivariate gaussian distribution. 
In the case v = Q, this is an estimation of the variance of the plain Monte Carlo estimator; when u — 1/*, this 
corresponds to an estimation of the Importance Sampling estimator (with importance sampling distribution 
equal to a standard Gaussian distribution centered at i/i,). The column "AdaptStr" is the limiting variance 
per sample of £^^^ which is equal to 




(+oo)' 



when each iteration t £ {1, • • • ,N} implies M draws (see the algorithm in Section |4|; note that by defini- 
tion of our procedure, the allocation is optimal. The column "GHS" is an estimate of X]iPi''"F(Mg) computed 
with MN samples; note also that by definition of this procedure, only the case v = Vi, and the proportional 
allocation has been considered. The columns "fireg" , "fi*" , and "/^/" report the results for the stratification 
procedures with these directions of stratification: the rows 'proportional allocation' report an estimation 
of X]iPi''"f(M) computed with MN samples (for fi G {/ireg. A**, Mi} ^'Hd v G {0,i^*}). We also consider the 
results for the optimal allocation, and to that goal we estimate the standard deviation within each stra- 
tum by an iterative algorithm - with N iterations - : the rows 'optimal allocation' report an estimation of 



^ These codes are freely available from the url |http: //www. tsl ■ enst . f r/~gf ort/| 
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Fig. 1 [left panel] Asian Option when (u, K, u) = (0.1, 45, i^*): drift vector and directions fi^^^ , fig, f^icg and fn . 
v^, has been scaled to have norm 1 {Vi, *— Vt,/0A2). [right panel] Basket Option when {c,K,u) = (0.1,45,;^*): drift 
vector i/i, and directions /i'^', fJ,g, /ij, /^reg and i/* has been scaled to have norm 1 [i/i, <— i/*/0.41). 

'|St^l ^EiPi I where i < /} is an estimation of the standard deviation of the 

strata computed with a total number of M draws allocated to each stratum according to the optimal allo- 
cation computed at the previous iteration {t— 1) (the allocation at iteration is the proportional one). For 

Latin Hypercube samplers, the total number of draws (MN) are allocated to generate A'^ i.i.d. estimators 
(k) 

£\j , k £ {1, • ■ • , A'^}, each based on a Latin Hypercube sample of size M. The estimate LHS is the average 
of these A'' estimators; we also report the variance equal to AI I SfcLi [^•1/'']'^ ^ {A^^^ Sj^l ^1/''}^ }• 



5.3 Asian options 

Consider the pricing of an arithmetic Asian option on a single underlying asset under standard Black- 
Scholes assumptions. The price of the asset is described by the stochastic differential equation — 
r dt + vdWt , Sq = So, where {Wt,t > 0} is a standard Brownian motion, r is the risk-free mean rate 
of return, v is the volatility and sq is the initial value. The asset price is discretized on a regular grid 
— tQ < ti < ■ ■ ■ < — T, with ti iT/d. The increment of the Brownian motion on [ti-i,ti) is 
simulated as jdYi for i G {1, • • • , d} where Y = (Vi, • • • , Yj) ~ M^i^, Id). The discounted payoff of a 
discretely monitored arithmetic average Asian option with strike price K is given by S{Y), 



E{y) = eM-rT)\4yei,p\{r~Q.^v')—^vJ-yyA-K\ , y = (yi, • ■ • , J/d) G 




where for a; £ R, x+ = max(a;,0). In the numerical applications, we take sq = 50, r = 0.05, T = 1, 
(w, K) G {(0.1, 45), (0.5, 45), (0.5, 65), (1, 45), (1, 65)} and d = 16. We choose /ij oc (d, d - 1, • • • ,1). 

We run AdaptStr when {v,K) = (0.1,45): on Figure [l] the optimal drift vector i/^, the direction ^^^^ 
obtained after N iterations of AdaptStr, and the directions of stratification ^g,/irog,M; plotted. In 
Figure (2] the successive directions t ^ fJ-^^\ the successive estimations of the quantity of interest 1 1—^ f*-*-* 
and of the variance t i-» (X^iPl'^'i*')^ ^^e displayed. We observe that {/^ t > 0} converges to the direction 
^g, and the convergence takes place after about 30 iterations. We find the same pattern for a wide range 
of parameter values. The choice of the stratification direction has a major impact on the variance of the 
estimate f as shown on Figure [2] [bottom right]. Along the iterations of the algorithm, the variance 
decreases from 0.1862 to 0.0016. We also observed that the convergence of the algorithm and the limiting 
values were independent of the initial values (pi^"-' , M^^'^ ) (these results are not reported for brevity) . These 
initial values (and the choice of the sequence {'y^^\t > 1}) only influence the number of iterations required 
to converge. AdaptStr can also be seen as a procedure that computes a stratiflcation direction and provides 
the associated optimal allocation. These quantities can then be used for running a (usual) stratiflcation 
procedure with M draws and for the optimal allocation. By doing such with M = 20 000, we obtain an 
estimate of the quantity E[0(y)] equal to 6.05 and of the variance equal to 0.002/M. We can compare these 
results to the output of GHS: this yields the same estimator of E[(;/)(y)] and a larger standard deviation 
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6.056 




Fig. 2 Asian Option when (v, K, v) = (0.1, 45, Ui,). [left panel] successive directions of stratification t i-^ . /i^") 
is proportional to the vector (1, ■ ■ ■ ,1) so that the d curves start from the same point l/\fd. By convention, the first 
component of /x'*' is positive, [top right] successive estimations of the quantity of interest t i— > [bottom right] 

successive values of the variance i 1-^ (EiPi o--'')^; the limiting value is 0.002. 

equal to 0.014/Af. Observe that since /i*-^-* — /ig, the two algorithms diflter from the allocations in the 
strata. 

We conclude this study of AdaptStr by illustrating the role of the drift vector v (see Ea. ll7p . We report 
on Figure|3]the limiting direction fj,^^\ the estimates t >—> and the variance t >—> (X^i Pi'^'i*^)^ when v = 
0. The limiting direction /x^^^ slightly differs from /Xg and is close to i/*. Moreover, the variance reduction 




Fig. 3 Asian Option with (v, K, u) = (0.1, 45, 0): [left panel] the limiting direction ^'^^ and for comparison, fig and 
Vi, normalised to have norm 1 (i^* j^*/0.42). [top right] successive estimations of the quantity of interest t i-^ E^^\ 
[bottom right] successive values of the variance t ^ (XliPi o--'')^: the limiting value is 0.004. 

is weaker: the limiting value of f (X^iPi'^'i ) is 0.004. The efficiency of the adaptive stratification 
procedure AdaptStr is thus related to the drift vector v in (|17[1 : similar conclusions are reached in (0) (see 
also (0)). 

We report in Tables [1] and [2] the variance of different estimators, as described in Section [5.21 

Insert TafcZesQ] and\^ about here 

Consider first the case = 0. When the volatility of the asset is low ^; = 0.1 and the strike is in-the-money, 
the performance of the adaptive stratification estimator "AdaptStr" and of the stratification estimator 
with fixed direction ^rcg, ^-nd /x; and with optimal allocations are equivalent. We observe indeed that 
the directions n^^\fiicg, and ni axe almost colinear. Compared to the plain Monte Carlo, the variance 
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reduction factor is equal to 2500. The LH estimator with a rotation along any of the directions and 
Hreg outperforms all the stratified estimators: the variance reduction is by a factor 10500. This reduction in 
the variance strongly depends upon the choice of the rotation: the LH estimator with no rotation implies 
a variance reduction by a factor 150. 

When the volatility of the asset is high v = 1, the conclusions are markedly difi'erent. Consider e.g. 
the case when the option is out of the money {K = 65). The adaptive stratification estimator "AdaptStr" 
provides a reduction of variance by a factor 150, which is again similar to the variance reduction afforded 
by the stratification with fixed directions /ircg, fJ'* and /i;, and optimal allocation; AdaptStr outperforms 
stratified estimators with any of the fixed direction /^reg, fJ-* or fn by a factor 13 when allocation is 
proportional. The LH estimator with no rotation only provides a reduction in variance by a factor 1.7; 
when the rotation along /i^^^ is applied, the reduction is by a factor 65. Here again, the LH estimator is 
very sensitive to the choice of the orthogonal matrix O. 

The use of the drift u = v* improves the variance of all the stratified estimators by a factor 2 to 
10, depending on the choice of the stratification direction; and by a factor 10 to 25 for "MC". Here again, 
"AdaptStr" is the best stratified estimator; its performance can be approached by stratification estima- 
tors with fixed directions, but the choice of this fixed direction depends crucially upon the values of 
the volatility and the strike. The vectors jJ-^^ and /Xreg are almost colinear in many cases e.g. when 
{v,K,u) — (0.1; 45; //^), but not always as observed in the case {v,K,v) = (l;65;i/^) from the variances 
given in Table [T] It is interesting to note that the use of the drift Vi, does not always improve the variance 
of the LH estimator. 

These experiments show that the choice of the stratification direction and of the allocation is crucial. 
For example, in the case {v,K,v) = (0.5,65,0), the adaptive stratification estimator improves upon the 
stratification estimator with fixed direction (1, • ■ • , 1)/Vd and optimal allocation by a factor 60 (and by a 
factor 190 when proportional allocation is used) - these results are not reported in the tables for brevity 
since this direction is rarely optimal - . Even if simple guesses for the direction reduce the variance, this 
reduction can be improved (by a factor 20) when optimal allocation is used; this allocation is unknown 
and has to be learnt. In these examples, LH outperforms in many cases stratification procedures provided 
it is applied with a rotation O: the rotation along fi"^^ outperforms LH with no rotation and when 
compared with other simple guess rotations, it provides similar or better variance reduction. All these 
remarks strongly support the use of adaptive procedures. 



5.4 Options with knock-out at expiration 

A knock-out barrier option is a path-dependent option that expires worthless if the underlying reaches a 
specified barrier level. The payoff of this option is given by 




-(y) = cxp(-rr) l^^^exp |^(r-0.5aO— + - A' j l{Sr{y)<B} 

where K is the strike price, B is the barrier and Sxiu) is the underlier price modeled as 

Sriy) = So exp I (r - 0.5(7^ )r + ay 




In the numerical applications, we set sq = 50, r — 0.05, T = 1, a — 0.1, d = 16 and {K,B) £ 
{(50, 60), (50, 80)}. We choose oc {d,d — 1, - ■ ■ ,1). On Figure IHleft panel], we plot fi^-^^ in the case 
{K,B,v) = (50,60,0); the limiting direction is /ij. On Figure |4[right panel], we plot /x(^) in the case 
{K,B,v) — (50,60,1^*); for comparison, we also plot Vi,, fig, /ireg and fn. This is an example where the 
optimal stratification direction /i*-^' does not coincide with the different directions of stratification (/ig, 
/Xreg and fii); in this example, /ig ~ /i; but the optimal direction of stratification is close to (1, • • • , l)/\/d. 
The direction associated to the regression estimator is far from being optimal. 

We report in Tables |3] and |3] the variances of the different estimators, as described in Section [5.21 

Insert Table\^ and^ about here 

Consider first the case where the drift v is set to 0. When (K, B) = (50, 60) (the option is at the money, 
and the barrier is close to the money), the adaptive stratification provides a variance reduction by a factor 
10 with respect to the plain Monte Carlo estimator. In this case, the stratification directions fii and /x* 
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Fig. 4 Barrier Option when (K,B) = (50,60) and [left panel] i/ = or [right panel] v = Vi,: directions /^'^^ Mreg, 
/i;, v^, and ^g. Vi, has been scaled to have norm 1 [Vi, <— Vi./0.84) 



(with an optimal allocation) perform almost as well (and ^; and ^* are close to [v ^ at the convergence), 
while /^reg provides a higher variance. For the LH estimator, the variance reduction is only by a factor 1.5. 
It is worthwhile to note that the best choices for the rotation , /i*-^^ and /i^,, lead to a variance thrice the 
one of "AdapStr". The use of the drift vector Vi, improves the variance of the adaptive stratification by 
a factor 1.8; the optimal stratification vector is now which surpasses /ircg. The variance of the LH 

estimator is also reduced. 

When {K,B) = (50,80) (the barrier is out of the money), a factor reduction 2800 is obtained by the 
adaptive stratification estimator "AdaptStr"; a similar variance reduction is achieved using the stratified 
estimator with direction /ircg and with optimal allocation. For the LH estimator, the variance reduction is 
by a factor 2000, when the rotation is /ircg. The use of the drift vector Vi, improves the behavior of all the 
algorithms: the variance of "AdaptStr" is reduced by a factor 3.8. Finally, LH with rotation /i^^' reduces 
the variance of LH with no rotation by a factor 1200. 

To conclude, this example shows again the interest of adaptive procedures in order to find a stratification 
direction, the optimal allocation or a rotation in LH. 



5.5 Basket options 

Consider a portfolio consisting of d assets. The portfolio contains a proportion qj. of asset k, k £ {1, . . . , d}. 
The price of each asset is described by a geometric Brownian motion (under the risk neutral probability 
measure) 

rfS.^''' (k) 
—^=rdt + VkdWP 

but the standard Brownian motions {Wj''^\ k G {1, . . . , d}} are not necessarily independent. For any t > s 
and k £ {1, . . . ,d} 

InS^t''^ = InSf ^ + (r - 0.5vl^ (t ~ s) + v^Vt^Yk 

where Y = (Yi, . . . , Yj) ~ J^d{0, The d x d matrix 17 is a positive semidefinite matrix with diagonal 
coefficients equal to 1. Therefore, the variance of the log-return on asset k in the time interval [s,t] is 
(t — s)v\, and the covariance between the log-returns i,j is it — s)viVjSij . It follows that Eij is the 
correlation between the log-returns. The price at time of a European call option with strike price K and 
exercise time T is given by £[^(1^)] where 

S{y) = exp(-rT) ("^ a^s^ exp ((r - 0.5vl)T + v^VTy^) - A 

\k=l / + 

and y = \/Sy {^/S denotes a square root of the matrix S i.e. solves MM'^ = S). In the numerical 
applications, S is chosen to be — l{i=j} + cl|j-^j}, Ofj, = 1/d, r — 0.05, T = 1, and d — 40. We 
consider {c,K) £ {(0.1, 45), (0.5, 45), (0.9, 45)}. The initial values {sg,fc < d} are drawn from the uniform 
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distribution in the range [20,80]; the volatilities < d} are chosen linearly equally spaced in the set 

[0.1, 0.4]. The assets are sorted so that vi < • ■ ■ < v^. We choose 

W oc [ais^Q^ exp((r - 0.5u?)r)wi, ■ ■ • , 0^4''' exp((r - G.^v^)T)vdj . 

In the case {c,K,u) = (0.1, 45, z^*), we plot on Figure [ijright panel] the limiting direction /j,^^^ and for 
comparison, the directions v-t, Hg, Urcg, fJ-i- We report in Tables [5] and [6] the variance of the different 
estimators, as described in Section [5.21 

Insert Tables\M and\Si about here 

In this example again the adaptive stratification estimator improves upon the best stratified estimator with 
(non-adaptive) stratification direction and optimal allocation. Here again, the optimal allocation improves 
the variance reduction, by a factor 15 for example in the case (c, K, v) = (0.9, 45, 0) for the fixed directions 
Mrcg, /iit or ^i. It is interesting to note that the variance reduction with respect to the plain Monte Carlo 
using "AdaptStr" ranges from 100 (c = Q.l,K = 45) to 2500 (c = 0.9,/s: = 45) whereas the use of the drift 
u* allows only a reduction by a factor 10. The choice of the stratification direction plays a more important 
role than the choice of drift direction. 

The comparison with the LH estimator is more difficult, because this estimator behaves totally differ- 
ently from the stratified estimator. First, except for c = 0.9, the use of the drift v^, increases the variance; 
whereas the effect of the drift is always markedly beneficial for the Monte Carlo estimators, the drift v^, 
may increase the variance by a factor as large as 15 (when {c,K) = (0.1,45)). Second, LH with rotation 
always outperforms (adaptive) stratification: the main difficulty stems from the choice of the rotation but 
the obtained results show that rotation along /i*-^^ provides either the maximal variance reduction or 
variance reduction similar to the best rotation among the three considered. Finally, the performance of 
the estimator is extremely sensitive to the choice of the simulation setting: when the correlation among 
the assets is large c = 0.9, the choice of the first vector of the orthogonal matrix O becomes crucial. With 
drift Vi^, rotation along /i*-^' (resp. ^rcg) improves LH with no rotation by a factor 22500 (resp. 3000). 



5.6 Heston model of stochastic volatility 

We consider a last example which is not covered by the methodology presented in (Q). We price an Asian 
option in the Heston model, specified as follows 

d^t = k{9 - ^t)dt + a^tdWt\ dSt = rStdt + y^t St {pdW} + ^J\-i?dW'l), dXt = Stdt 

where {Wt,t > 0} and {Wf,t > 0} are two independent Brownian motions, r is the risk free rate, a > 
the volatility of the volatility process ^, fc > the mean reversion rate, 8 > the long run average volatility, 
and p G [^1,1] the correlation rate. The price of an Asian Call option with strike K and maturity T is 



E 



exp(-rT) ( 



(19) 



In our tests, we have chosen the parameters so that (t^ < AkO. This enabled us to replace by Gaussian 
increments the finitely-valued random variables used to discretize in the scheme proposed in |3) to 
approximate the SDE satisfied by {St,Xt,^t)- We refer to (jj) for a precise description of the scheme that we 
used. The resulting approximation of Xj' is generated from a vector Y — (Yi, . . . , Y^^, l^-i-lj ■ • ■ i ^2^) ~ 
■^2d{0j Id) corresponding to the increments of {W^ , W'^) and a vector B = {Bi , . . . , B^) of independent 
Bernoulli random variables with parameter 0.5. The price (|19|l is then approximated by 



E[exp(-r-r) (^Xd-A'jJ. 

In the following tests we keep v in (|17|l equal to zero and do not stratify the random vector B. 

We choose So = 100, 6 = 0.01, k = 2, a = 0.2, T = 1, r = 0.095, p = -0.5 and {^o,K) € 
{(0.01, 120), (0.01, 100), (0.01, 80), (0.04, 130), (0.04, 100), (0.04, 70)}. The discretization step of the scheme 
is d = 50. 

On Figure [5] we plot the successive estimations of the variance t i-^ Cl^i Pi^-*^ j when (Co, A") = 
(0.01; 120) and {^o,K) = (0.04; 70). 

We plot on Figure[6]the components of /i'-^' with respect to the component index in the cases (^o, K) — 
(0.01; 120) and {^o,K) = (0.04; 70). 

We report in Tables [7] and [8] the variances of some estimators described in Section [5.21 
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Fig. 5 Asian Call Option in Hcston model: successive estimations of the variance t i-^ (JZiPi o-.'*^)^ [left panel] 
when iio,K) = (0.01; 120); [right panel] when (^o, K) = (0.04; 70) 
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Fig. 6 Asian Call Option in Heston model: vectors /i^^' and /Jrcg in the cases [left panel] (^o,^) = (0.01; 120); 
[right panel] i^o,K) = (0.04; 70) 

Insert Table^ and\Bi about here 

The first observation is tiiat even in tiiis case, AdaptStr still works and provides variance reduction 
when compared to Monte Carlo. It is all the more efficient than the option is out of the money : when 
{^OjK,!/) — (0.01,120,0), the variance reduction is by a factor 85; when {£,OyK,u) — (0.04,130,0), the 
variance reduction is by a factor 105. AdaptStr and stratification with fixed direction ^reg are equivalent, 
provided the last one is applied with optimal allocation, just necessitating again iterative procedures. 

We can wonder on the effect of the moneyness and the volatility of the model on the variance reduction. 
As shown in Table [T] in general the achieved variance reduction is larger when the option is out of the 
money (for K = 130 and = 0.04 the variance is divided by nearly 105 when using AdaptStr). We also 
observe that stratification procedures outperform LH samplers when the option is out of the money, but 
LH is equivalent to stratification when the option is in the money. 

6 Proofs 

6.1 Proofs of Section 13] 

In the sequel, we denote Im {1, • • • ,/}™ and piai \j Jg. dA^ ^ /g. (^/^ - ^ /g. t/)^/^ 
in place of pi[^)ai{^). 

Lemma 1 Let {S;, i G Xm} be given by (gl). 
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(i) V6 > 0, VM > 6-1, supg,„f^^^^^ ,^>, |m<;,2m(m,5, Q) - Eiei„, ^| < a,,(,_V-M Var[0(y)]. 
(a) Assume that essinfg.;^ (x/ff) > '^'^^ esssup^._>^ (/m/x) < +oo. Let e > 0. For any (I,M) such that 
Af/-"essinfg.;,(x/5) > 1 + e 



Mc|.m(m,5,Qx)- I] 



2 2 
Pi c^i 



9i(x) 



essmfg.A (x/fl) M \ x V 



X ) " essinf „.A (x/s) 



('mj for any positive integers M, I and real e > 1, 



„2 2 / rm 1 

M,lM{l^,g, Q*(/i)) - ^ < Var[0(y)] (1 + + -_ 

where Q*(^) = {gi'(^),i G Xm} is t/ie optimal allocation defined by 

Proof By definition of Mi (see Eq. H}, Mi = when gi = and M; > 1 when > . One may have 
Mi = 1 when gi G {Q,M~^) but then MMr^ < q7^ . Hence, 



iel,„:9i>0 



Pi <^i 



< E 

ieIm,gi>l/J\f 



Mqi - Mi 



A/,- 



2 2 
9i 



E 



2 2 



(20) 



iel™:0<gi<l/j\/ 

('i^ When inf jgj^ gi > e > A/"!, the second term in the rhs is null and since by (|4]|, A/gi — 1 < Afi < Mgi + 1, 



E 

iGl„,,gi>l/Jl/ 



Afgi - A/; 



A/i 



2 2 

^ < M-1 



sup Pig; M ^ (gi - M 1) 1 piO"?, (21) 
ieI„,gi>l/M J ieX,„,gi>l/M 



which yields the desired result upon noting that pig. "'^ < gi ^ < e ^ and X^iPi''"? — Var[(;/)(y)]. 

(ii) Under the stated assumptions, gi(x) = X^-^ ^ essinfg._\ (x/p) Hence A/gj > 1 + e 

which implies that the second term in the rhs of (|20|) is null. This also implies that q^ — M^^ > 
^1 — YTe^ essinfg.^ (x/ff) L"'". We conclude the proof by combining this bound with (|2ip and the following 
one : 

Pi Is /m 1 /™ 

— TT = V TT ^ esssup (/^/x) A — ^ < esssup (/^/x) A — — j-^— . 

ft(x) Js.xdX x-A ftlX) x-A essmfg.A (x/s) 

('mj Note that by convention, pfa? /qi(p) = when qid-i) = 0. By definition of the optimal allocation 
(see Eq.Q, 



Pif^i/qtifJ-) = liifJ-) I^EPj'^J j - ya.r[(f){Y)] . 

The second term in the rhs of (|20|l is upper bounded by I"^M~^ Var[0(y)]. For the first term. 



[Var[<^(y)]]-l ^ 

ieI„,g?(M)>l/M 



A/g^(/i)-Afi p?ai2 



gr(M) 



< E 

iGl„,l/j\/<g? (Ai)<e/Af 



Mgr(.)-.^i. ^.(^^ 



Afi 



E 

ieX,„,(;i*(M)>e/J>/ 



Mqt(p) - A/i 



A/i 



For all i such that g,*(/i) > 1/A/, A/; l|A/g,*(^i) - A/i| < 1 which implies that 



E 

iGX™,l/A/<(Ji*(Ai)<c/M 



A/gn/i)-A/i 



M: 



For aU i such that g,^(^) > e/A/, Mr^\Mqf^{^i) - Mi\ < Mr^ < {Mql (fi) - < (e- 1)-^ which implies 
that 



E 

iGX„,,g|'(Ai)>e/M 



Mg.^(M)-Mi , ^.-1 
Qi W < (e - 1) 



A/i 
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Proof of Proposition Q] To prove the Proposition [TJ we need the two following Lemmas. The first is a 
standard change of variables formula (see for example, 0, Theorem 4.1.11)). Define G~^{ 
{G^^{xi), . . . , G^{xm)) where is the c.d.f. associated to the density pj. on R. 

Lemma 2 Let h : R™ — > R 6e a measurable function. Assume that h is nonnegative or is such that 
/ijm \^\^{g>id} '^^ < +°°- Then, for all < v^. < wj^ < 1, k € {1, . . . , 1} 

[ h^\q>a\d\=[ -oG^^dX. (22) 

The second technical Lemma is our key approximation result. 

Lemma 3 Let h, 7 : R™ ^ R 6e functions such that Jjg„ ^/i^ + 7^^ /g dX < +00. Define for i G Tm, 
r/ien lim/_+oo EieX,„ i-Ri['^.7]| = 0. 



/i dA ) ( / 7 dA ) . (23) 



Proof By polarization, it is enough to prove the result when 7 = ft with /g dX < +00. This 

integrability condition ensures that A-a.e. , g — implies h = and by (|22p. one has 

Ri[h,h]= [ ^oG~^dX-r{[ ^oG-^dx] , 

•^nr=i[(H--i)A^»'./-fi 9 V-^nr=i[(»'=-i)/-f.H-/^i s / 

where the right-hand-side is non-negative by Cauchy-Schwarz inequality. Set h{u) ^(G~^(it)) if u G 
(0, 1)™ and otherwise. By (|22|) and the integrability assumption made on h, the function h is square 
integrable on R"*. Using the definition of h for the first equality and symmetry for the second one, one has 

//"'" r 
h{u){h{u) - h{v)}dudv = ^ / {h{u) - h(v)}'^ dudv 

= — V / / {h{u) - h{u + w)}^ dwdu < - [ f {h(u) - h{u + z / I)f dudz . 



where we have set, for i G Tm, e7i = IlfcLi [(^fe ~ 1) /^i By continuity of the translations in L^(R'", du) 

and Lebesgue's Theorem, one obtains that the right-hand-side converges to as J —> 00. 

We now proceed to the proof of Proposition [T] Under A[T1 it holds that 

ft(x) > f essinf {x/g)\ f g dX = essinf (x/g) ■ (24) 

2 2 

Hence, by Lemma[ll|I]) , to prove the first assertion, it is enough to check that lim/_+_|.(xj /}m ^T(^ ~ 

?aD(M,x)- By definition of i?; (see Eq. ([23}), 

P? 'T? _ 4/m(Cm-^')/5 rfA-i?i[/^,Cp^]+J?i[7/,^/^,,/,^/^] 



gi(x) Is, X dX 

2 , , ^ k- /'(Cm - i'Dia dX ~ Rilx, /2(Cm - v^)/x] 



iej„ 



/" X dA 



Therefore ^ 44-d(/^,x)= E ' ^"^^^^^ + ^./.l - i^iLf.. CM 



and one easily concludes with (|24|l and Lemma |3] (which applies under i^and A[3]). The second assertion 
is a consequence of Lemma [Tl|n]) . 



18 



Proof of Proposition Since for a,b > 0, \^/a — < ^\a — b\, one has 



E 

iex,„ 



PiO-j 



Si iSi 



V-'m /p dA 



1/2 



1/2 



1.2 \ 



Under J f^i^^fi ~ i^fi)/9 dA < +oo, and by Lemma[3l the right-hand-side converges to as 7 +oo. 
Therefore, 



^hm^ E Pi-i- 



Si 



dA 



: 



(25) 



We now write 



f^i\lc,t,-ipl dA 


) E ki(xM)- 


- ft (m) 












< E 


E ^'j'^j - / 




dA 


+ E 




ft^^JCt^-i'f, 





By Ea. (|25|l . the rhs tends to zero as / H-oo. The second assertion is a consequence of Lemma [TJm]) 
apphed with e = /J™ and of Eq. ([25}. 



6.2 Proofs of Section H 

We only give the proof of Proposition [3] and refer to (0) for the one of Corollary [1] 

Proof of Proposition [21 Let H £ R'* be such that \H\ < ei = j^, a — {H, ei), b — \H — aei\ and 62 be 

equal to ^^^'^^ if & 7^ and to any vector with norm 1 orthogonal to ei otherwise. We complete (ei, 62) 
with (63, . . . , e^;) to obtain an orthonormal basis of R^. For a G R"^, q^, = (a, e^,). 



gzifJ. + H) - gzifJ,) ^ / _ ^ h{a)da- h{a) da = j j ' h{a)daida2:d 



Rd-i Jo \ ImI + as 



ei H- 2^ Qfeefc 



k=2 



dsda 



2:d 



h z 



_ + as)ei + bse2 
iKd-i '" I" (l^il + as)2 -f (fes)2 



+ Q2 



zbs 



(ImI +as)2 + (bs)2y \ii\+as 
{y,H) ^^^+sH^ 



+ E "fe^*: 

A:=3 

(1^1 + as)e2 — fesei '\ az -I- a2^'lA'l , , 

i^da2:dO'S 



(ImI + as) 



.0 . ^^'^\, + sH\ 



where, for the last equality, we made the change of variable 



(26) 



V(IH+«s)2 + (bs)2 ^ 

P2 = T- j— OL2 

-I- as 



zbs 



as) 7(|M|+as)2 + (&s)2 ' 
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used the equality + as)ei + bse2 = fi + sH and remarked that + sH,y) = z imphes that az + 
(y, 62) h\p\ = + as) {y, H). Define, for v G 7(/i, i/) / y^fe(j/)dA^. We deduce that 

g,{lx + H)-g,{ii) + (H, j ^^h{y) d\ii^ ^ I^H, j\^{h, 11) ~ ^{h, 11 + sH)} ds^ . 

Consider now the following decomposition 7(/i, v) — ^ + 7 ^/il||.|<j^/-^, . Under assump- 

tion (jlSp . the first term in the rhs is arbitrarily small as M goes to infinity uniformly in u close to /i. When 
V ^ ^, the measure 1{[.[<m}-'^z converges weakly to 1{|.|<a/}A2; hence, the second term converges to 

7 ( 'il{[.[<j\/} , A*) • Therefore, the function v 1— > 'y{h,v) is continuous at ^ and the conclusion follows easily. 
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Model 




Price 


Variance 


V K 


V 


1i 


- 


MC 


AdaptStr 


GHS 


Mrcg 




Mi 


0.1 45 





prop 


6.05 


8.640 






0.017 


0.016 


0.017 






opt 


6.05 


8.640 


0.004 


_ 


0.005 


0.004 


0.004 




Vi, 


prop 


6.05 


0.803 


_ 


0.014 


0.014 


0.008 


0.007 






opt 


6.05 


0.803 


0.002 




0.005 


0.002 


0.002 


0.5 45 





prop 


9.00 


158.1 






2.086 


2.128 


2.243 






opt 


9.00 


158.1 


0.352 


_ 


0.362 


0.371 


0.390 




V-k 


prop 


9.00 


14.95 


_ 


0.203 


0.225 


0.324 


0.221 






opt 


9.00 


14.95 


0.147 




0.162 


0.223 


0.162 


0.5 65 





prop 


2.16 


48.41 






1.857 


1.859 


2.097 






opt 


2.16 


48.41 


0.093 


- 


0.096 


0.096 


0.147 




V* 


prop 


2.16 


2.32 




0.039 


0.046 


0.048 


0.049 






opt 


2.16 


2.32 


0.020 




0.024 


0.025 


0.026 


1 45 





prop 


14.01 


852.0 






52.24 


54.51 


57.72 






opt 


14.01 


852.0 


5.39 




5.48 


5.69 


6.02 






prop 


14.01 


42.76 




3.014 


3.185 


4.360 


3.265 






opt 


14.01 


42.76 


2.270 




2.400 


3.220 


2.450 


1 65 





prop 


7.79 


587 






50.9 


50.5 


55.8 ; 






opt 


7.79 


587 


3.75 




3.01 


3.08 


3.95 




v* 


prop 


7.78 


22.34 




1.55 


1.75 


2.01 


1.56 






opt 


7.78 


22.34 


0.99 




1.14 


1.31 


1.00 



Table 1 Asian Option: Monte Carlo and stratification 



Model 




Price 






Variance 




V 


K 


V 




Latin 


Latin +Rot /irog 


Latin +Rot /x* 


Latin +Rot 


0.1 


45 





6.05 


0.0596 


0.0008) 


0.0008 


0.0008 








6.05 


0.6000 


0.0063 


0.0009 


0.0003 


0.5 


45 





9.00 


35.55 


0.374 


0.351 


0.385 






I'* 


9.00 


11.72 


0.166 


0.242 


0.137 


0.5 


65 





2.16 


27.55 


0.152 


0.135 


0.147 








2.16 


2.00 


0.043 


0.037 


0.033 


1 


45 





14.00 


357.70 


10.86 


9.84 


12.20 








14.00 


36.25 


2.35 


3.25 


2.10 


1 


65 





7.78 


339.11 


7.94 


7.70 


9.14 








7.78 


19.62 


1.49 


1.34 


1.25 



Table 2 Asian Option: Latin Hypercube 



Model 




Price 


Variance 


K B 


V 


Qi 




MC 


AdaptStr 


GHS 


^rcg 




Mi 


50 60 





prop 


1.38 


2.99 






1.46 


1.13 


1.14 






opt 


1.38 


2.99 


0.31 




0.83 


0.31 


0.31 




V* 


prop 


1.38 


1.34 




0.50 


1.15 


0.49 


0.50 






opt 


1.38 


1.34 


0.17 




1.12 


0.31 


0.31 


50 80 





prop 


1.92 


4.92 






0.016 


0.017 


0.016 






opt 


1.92 


4.92 


0.002 




0.002 


0.002 


0.002 




V* 


prop 


1.92 


0.704 




0.0011 


0.0012 


0.0013 


0.0011 






opt 


1.92 


0.704 


0.0005 




0.0006 


0.0006 


0.0005 



Table 3 Barrier Option: Monte Carlo and stratification 



Model 




Price 






Variance 




K B 


V 




Latin 


Latin +Rot ^rcg 


Latin +Rot pn. 


Latin +Rot /i(^) 


50 60 





1.38 


1.98 


(1.5074; 1.5416) 


0.97 


0.98 






1.38 


1.26 


1.01 


0.31 


0.21 


50 80 





1.92 


0.727 


0.002 


0.002 


0.002 






1.92 


0.4501 


0.0005 


0.0006 


0.0004 



Table 4 Barrier Option: Latin Hypercube 



Model 




Price 


Variance 


c K 


V 


1% 




MC 


AdaptStr 


GHS 








0.1 45 





prop 


11.24 


22.16 






0.25 


0.25 


0.25 






opt 


11.24 


22.16 


0.22 




0.22; 


0.22 


0.22 






prop 


11.24 


1.39 




0.26 


0.87 


0.26 


0.26 






opt 


11.24 


1.39 


0.21 




0.65 


0.21 


0.21 


0.5 45 





prop 


11.56 


51.13 






0.37 


0.37 


0.37 






opt 


11.56 


81.13 


0.10 




0.10 


0.10 


0.10 






prop 


11.56 


8.64 




0.08 


0.09 


0.09 


0.08 






opt 


11.56 


8.64 


0.06 




0.07 


0.07 


0.06 


0.9 45 





prop 


12.09 


134 






0.75 


0.74 


0.74 






opt 


12.09 


134 


0.05 




0.05 


0.05 


0.05 






prop 


12.09 


14.46 




0.022 


0.029 


0.024 


0.023 






opt 


12.09 


14.46 


0.008 




0.012 


0.009 


0.008 



Table 5 Basket Option: Monte Carlo and stratification 



Model 




Price 






Variance 




c K 


V 




Latin 


Latin +Rot /.ti-cg 


Latin +Rot ^* 


Latin +Rot /i*^' 


0.1 45 





11.24 


0.08 


0.07 


0.03 


0.03 




V* 


11.24 


1.18 


0.92 


0.05 


0.04 


0.5 45 





11.56 


4.94 


0.02 


0.02 


0.02 




Vi, 


11.56 


6.90 


0.02 


0.02 


0.02 


0.9 45 





12.09 


13.05 


0.007 


0.006 


0.007 




Vi, 


12.09 


12.51 


0.0038 


0.0026 


0.0006 



Table 6 Basket Option: Latin Hypercube 



Model 




Price 


Variance 


«o K 


V 


li 




MC 


AdaptStr 


/^rcg 


0.01 120 





prop 


0.007 


0.0272 




0.0234 






opt 


0.007 


0.0272 


0.0003 


0.0003 


100 





prop 


5.19 


18.06 




2.009 






opt 


5.19 


18.06 


1.700 


1.725 


80 





prop 


22.65 


30.01 




3.16 






opt 


22.65 


30.01 


2.85 


2.93 


0.04 130 





prop 


0.024 


0.152 




0.098 






opt 


0.024 


0.152 


0.001 


0.001 


100 





prop 


6.42 


46.58 




2.37 






opt 


6.42 


46.58 


1.69 


1.68 


70 





prop 


31.74 


88.38 




4.08 






opt 


31.74 


88.38 


3.58 


3.71 



Table 7 Asian Option in Heston model: Monte Carlo and stratification 
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Model 




Price 




Variance 




& K 


u 




Latin 


Latin +Rot /.treg 


Latin +Rot /iW 


0.01 120 





0.007 


0.027 


0.008 


0.009 


100 





5.19 


2.77 


2.08 


2.15 


80 





22.65 


3.65 


2.37 


2.20 


0.04 130 





0.024 


0.15 


0.03 


0.03 


100 





6.42 


7.58 


2.59 


3.22 


70 





31.74 


3.19 


3.81 


3.54 



Table 8 Asian Option in Heston model: Latin Hypercube 



